Optimizing the CVaR via Sampling
Conditional Value at Risk (CVaR) is a prominent risk measure that is being
used extensively in various domains. We develop a new formula for the gradient
of the CVaR in the form of a conditional expectation. Based on this formula, we
propose a novel sampling-based estimator for the CVaR gradient, in the spirit
of the likelihood-ratio method. We analyze the bias of the estimator, and prove
the convergence of a corresponding stochastic gradient descent algorithm to a
local CVaR optimum. Our method makes it possible to consider CVaR optimization in new
domains. As an example, we consider a reinforcement learning application, and
learn a risk-sensitive controller for the game of Tetris.
Comment: To appear in AAAI 201
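To make the idea of a sampling-based, likelihood-ratio style CVaR gradient estimate concrete, here is a minimal sketch on a toy problem. Everything specific in it is an assumption for illustration and is not taken from the abstract: the outcome is Z = X with X ~ N(theta, 1), CVaR is taken over the lower alpha-tail, and the estimator form (score function times the shortfall below the empirical quantile, averaged over the tail samples) is one plausible reading of "gradient as a conditional expectation". The function name `cvar_gradient_estimate` is hypothetical.

```python
import random


def cvar_gradient_estimate(theta, alpha, n_samples, rng):
    """Estimate d/dtheta CVaR_alpha(Z) for Z = X ~ N(theta, 1) (toy assumption)."""
    xs = [rng.gauss(theta, 1.0) for _ in range(n_samples)]
    # Empirical alpha-quantile (value at risk) of the sampled outcomes.
    k = max(1, int(alpha * n_samples))
    var_hat = sorted(xs)[k - 1]
    total = 0.0
    for x in xs:
        if x <= var_hat:
            # Score function of N(theta, 1): d/dtheta log f(x; theta) = x - theta.
            score = x - theta
            total += score * (x - var_hat)
    return total / (alpha * n_samples)


rng = random.Random(0)
grad = cvar_gradient_estimate(theta=0.5, alpha=0.05, n_samples=200_000, rng=rng)
print(grad)
```

In this toy model shifting theta shifts the whole distribution, so the true CVaR gradient is 1 and the printed estimate should land close to that value. A stochastic gradient scheme would feed such estimates into a parameter update.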
Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis
We consider the off-policy evaluation problem in Markov decision processes
with function approximation. We propose a generalization of the recently
introduced emphatic temporal differences (ETD) algorithm
(Sutton et al., 2015), which encompasses the original ETD(λ), as well as
several other off-policy evaluation algorithms as special cases. We call this
framework ETD(λ, β), where our introduced parameter β controls the decay rate
of an importance-sampling term. We study conditions under which the projected
fixed-point equation underlying ETD(λ, β) involves a contraction operator, allowing
us to present the first asymptotic error bounds (bias) for ETD(λ, β). Our results
show that the original ETD algorithm always involves a contraction operator,
and its bias is bounded. Moreover, by controlling β, our proposed
generalization allows trading off bias for variance reduction, thereby
achieving a lower total error.
Comment: arXiv admin note: text overlap with arXiv:1508.0341
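As a rough illustration of how a decay parameter on an importance-sampling term can affect variance, the sketch below iterates a follow-on style trace F_t = 1 + β · ρ_{t-1} · F_{t-1}. This recursion is an assumed form chosen for illustration, and the abstract does not state it. The function name `followon_trace` and the constant importance ratios are likewise hypothetical.

```python
def followon_trace(rhos, beta):
    """Return the trace F_t = 1 + beta * rho_{t-1} * F_{t-1}, starting from F_0 = 1."""
    f = 1.0
    trace = [f]
    for rho in rhos:
        f = 1.0 + beta * rho * f
        trace.append(f)
    return trace


# Hypothetical constant importance-sampling ratios, for illustration only.
rhos = [1.5] * 100

# beta * rho = 1.35 > 1: the trace grows without bound.
slow_decay = followon_trace(rhos, beta=0.9)

# beta * rho = 0.75 < 1: the trace settles near 1 / (1 - 0.75) = 4.
fast_decay = followon_trace(rhos, beta=0.5)
```

Under these assumptions a smaller decay parameter keeps the importance-weighted trace bounded, which is the kind of variance reduction that would be traded against bias.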